Rough Sets and Confidence Attribute Bagging for Chinese Architectural Document Categorization

Authors

  • Xiang Zhang
  • Changhua Li
  • Lili Dong
  • Na Ye
Abstract

Traditional feature selection by threshold filtering discards much useful architectural information, and the Bagging algorithm gives every weak classifier the same voting weight. To address both problems and improve the performance of Chinese architectural document categorization, a new algorithm based on rough sets and Confidence Attribute Bagging is proposed. Rough sets are used for feature selection: first, the core attributes are found from the discernibility matrix and one of them is taken as the starting point; then attribute significance and dependency serve as heuristic information for selecting further features. A Chinese architectural document classifier is designed with the Confidence Attribute Bagging algorithm, in which the voting weight of each weak classifier is derived from its results and the strong classifier's result is obtained by a vote among the weak classifiers. This weighting scheme is applied within the Attribute Bagging algorithm to build the classifier. The experimental results show that the method is easy to implement, effectively reduces the dimensionality of the feature space, and improves classification accuracy.
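The two ideas in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the toy term-presence table, the function names, the choice to start from the whole core (the abstract starts from one core attribute), the use of dependency alone as the greedy heuristic, and the example weights are all assumptions made here.

```python
from collections import defaultdict
from itertools import combinations

def partition(rows, attrs):
    """Group row indices into classes that are indiscernible on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, labels, attrs):
    """Fraction of rows whose label is fully determined by attrs."""
    pure = (b for b in partition(rows, attrs)
            if len({labels[i] for i in b}) == 1)
    return sum(len(b) for b in pure) / len(rows)

def core(rows, labels, all_attrs):
    """Attributes that appear as singleton entries of the discernibility matrix."""
    found = set()
    for i, j in combinations(range(len(rows)), 2):
        if labels[i] != labels[j]:
            diff = [a for a in all_attrs if rows[i][a] != rows[j][a]]
            if len(diff) == 1:
                found.add(diff[0])
    return found

def reduct(rows, labels, all_attrs):
    """Start from the core, then greedily add the attribute that raises dependency most."""
    selected = sorted(core(rows, labels, all_attrs))
    target = dependency(rows, labels, all_attrs)
    while dependency(rows, labels, selected) < target:
        rest = [a for a in all_attrs if a not in selected]
        best = max(rest, key=lambda a: dependency(rows, labels, selected + [a]))
        selected.append(best)
    return selected

def weighted_vote(predictions, weights):
    """Combine weak-classifier predictions; each vote counts with its weight."""
    totals = defaultdict(float)
    for label, w in zip(predictions, weights):
        totals[label] += w
    return max(totals, key=totals.get)

# Hypothetical toy data: term presence (1/0) in five documents.
attrs = ["beam", "roof", "garden"]
rows = [
    {"beam": 1, "roof": 0, "garden": 0},
    {"beam": 1, "roof": 1, "garden": 0},
    {"beam": 0, "roof": 1, "garden": 0},
    {"beam": 0, "roof": 0, "garden": 1},
    {"beam": 1, "roof": 0, "garden": 1},
]
labels = ["A", "A", "B", "B", "B"]

selected = reduct(rows, labels, attrs)   # "roof" is dropped as redundant
# Weights here stand in for per-classifier confidence, e.g. validation accuracy.
winner = weighted_vote(["A", "B", "B"], [0.9, 0.3, 0.4])
```

In this toy table the selected attributes are `beam` and `garden`, and the weighted vote returns `A` even though two of the three weak classifiers predicted `B`, which is the behaviour that distinguishes confidence-weighted voting from the equal-weight vote of plain Bagging.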

Similar resources

Keyword Reduction for Text Categorization using Neighborhood Rough Sets

Keyword reduction is a technique that removes some less important keywords from the original dataset. Its aim is to decrease the training time of a learning machine and improve the performance of text categorization. Some researchers applied rough sets, which is a popular computational intelligent tool, to reduce keywords. However, classical rough sets model, which is usually adopted, can just ...

Fuzzy-rough attribute reduction with application to web categorization

Due to the explosive growth of electronically stored information, automatic methods must be developed to aid users in maintaining and using this abundance of information effectively. In particular, the sheer volume of redundancy present must be dealt with, leaving only the information-rich data to be processed. This paper presents a novel approach, based on an integrated use of fuzzy and rough s...

Multiple Sets of Rules for Text Categorization

This paper concerns how multiple sets of rules can be generated using a rough sets-based inductive learning method and how they can be combined for text categorization by using Dempster’s rule of combination. We first propose a boosting-like technique for generating multiple sets of rules based on rough set theory, and then model outcomes inferred from rules as pieces of evidence. The various e...

A Framework for Optimal Attribute Evaluation and Selection in Hesitant Fuzzy Environment Based on Enhanced Ordered Weighted Entropy Approach for Medical Dataset

Background: In this paper, a generic hesitant fuzzy set (HFS) model for clustering various ECG beats according to weights of attributes is proposed. A comprehensive review of the electrocardiogram signal classification and segmentation methodologies indicates that algorithms which are able to effectively handle the nonstationary and uncertainty of the signals should be used for ECG analysis. Ex...

A Comparative Study on Chinese Text Categorization Methods

This paper reports our comparative evaluation of three machine learning methods on Chinese text categorization. Whereas a wide range of methods have been applied to English text categorization, relatively few studies have been done on Chinese text categorization. Based on a re-constructed People’s Daily corpus, a series of controlled experiments evaluate three machine learning methods, namely k...

Journal title:
  • JSW

Volume 6  Issue 

Pages  -

Publication date 2011